Maps are especially useful as navigational aids - with relevant, accurate and complete location data, people can find the most efficient route from A to B.
Exploring the Hampshire countryside by bike, our development team got to thinking: could we develop a tool to work out the optimal bike route between two places? We prefer riding on quieter roads, which are often more scenic, and are willing to ride a bit further to experience the scenery and avoid busy roads.
In this tutorial, we will us the OS Features API Highways_RoadLink feature type and the NetworkX Python package to create a graph of the road network in an area in Hampshire. Then we'll develop a way to weight each road segment based on length, elevation gained / lost and the "form of way", which refers to road size.
We'll use these weights to calculate the weighted shortest path between points: a cycling route optimized for our preferences. By incorporating this weighting function, users can alter their preferred path based on the attributes of road segments en route. For example, a cyclist might want to stick to very small roads, so larger roads will have a higher weight - or commercial trucking operators might want to avoid steep hills, or only travel on large roads.
⚠️⚠️⚠️ It's important to note that this tool is for learning purposes only, and should not be used as a routing system. Other factors might contribute to the safety of these routes!
We use a few open source python packages in our code, and connect to two APIs accessible via the OS Data Hub.
Geopandas: "an open source project to make working with geospatial data in python easier". Geopandas adds geospatial functionality to the popular Pandas data analysis library. If you're not familiar, read 10 minutes to pandas.
NetworkX "is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks." A network is a set of nodes, connected by links (or edges). The OS Features API provides Highways_RoadLink data in the form of GeoJSON linestrings. We can build a network from this since the IDs of the nodes connected by the link linestring is stored in the properties of the feature:
{
"type": "Feature",
"geometry": {
"coordinates": [
[ -0.887902,50.994843 ],
[ -0.887922, 50.995378 ],
[ -0.887996, 50.996639 ],
[ -0.887992, 50.996783 ]
],
"type": "LineString"
},
"properties": {
...
"StartNode": "osgb4000000023147283",
"EndNode": "osgb4000000023147284",
...
}
}
Folium allows us to create Leaflet maps in our IPython Notebook, so we can visualise the results of our analysis.
And the OS Python API Wrapper is a new tool we've launched to make working with OS Data Hub APIs a breeze for Python developers. At this point, it is a prototype tool - so if you catch any bugs or want to contribute a feature, open an issue on our Github.
We'll also use matplotlib for some quick visualisation and standard Python libraries like os, sys and json:
import os
import sys
import json
import glob
import folium
import pandas as pd
import geopandas as gpd
import networkx as nx
from datetime import datetime
import matplotlib.pyplot as plt
from os_paw.wfs_api import WFS_API
from folium.plugins import FloatImage
print('=> Imported libraries')
Location data is fetched from OS Data Hub APIs. These APIs are designed to give developers and data scientists access to current, high resolution location data on demand. If you don't already have one, go sign up for an account on the OS Data Hub.
Our analysis here uses Premium data from the OS Features API, so you will need a Premium account. But - you can get up to £1000 of OS data free of charge each month. Furthermore, you can set your API key to "Development Mode" - transactions submitted with that key will not be charged.
For this to work properly, create a project on a Premium OS Data Hub account and enable the OS Maps API and OS Features API.
Paste your key below 👇
# Set API key, copied from osdatahub.os.uk.
# (This tutorial requires a project key with the OS Features
# and OS Maps APIs enabled from a Premium account!)
key = 'JydUr1HO7ejqBhw0YP19W3b1GonFwmzr'
And create an instance of the OS Python API Wrapper:
# Create WFS API instance
wfs_api = WFS_API(api_key=key)
Next we'll fetch data within a bounding box using the get_all_features_within_bbox method. We'll need to specify which feature type we'll want features from - for a full list see the docs. (Note how some feature types are Premium and some are Open Data.)
We will need to specify certain arguments the wrapper needs to have to construct a valid request, including:
EPSG:4326 or EPSG:27700)"lower left latitude, lower left longitude, upper right latitude, upper right longitude")allow_premium flag to TrueNote that currently the Python API Wrapper is configured to limit requests to 1000 features, or 10 transactions of 100 features each, by default. This is to prevent users from unintentionally racking up large bills by submitting a request that loops through hundreds of Premium transactions - but it can be overridden by setting max_feature_count equal to whatever number you'd like to request. Careful here - be sure you're in Development Mode while testing this, or you might accidentally run up costs!
# Fetch geometries from Highways_Roadlink feature type
# ☝️ Remember to switch your API key to "Development Mode"
# while you're experimenting as this is a Premium dataset!
data = wfs_api.get_all_features_within_bbox(type_name="Highways_RoadLink",
bbox="50.958859,-0.831871, 51.025201,-0.6667327",
srs='EPSG:4326',
allow_premium=True,
max_feature_count=2000
)
# There are fewer than 1000 Highway_RoadLink features in the bounding box we're
# providing - if you change that you may need to add max_feature_count=5000 or more.
Now that we've got a GeoJSON FeatureCollection assigned to the data variable, we are ready to create a geopandas GeoDataFrame. Geopandas plays well with GeoJSON, so this is quite simple. We will also cast a few of the columns to_numeric data types, so they behave as we expect when we start doing arithmetic on them.
We'll call the variable gdf - short for GeoDataFrame, a common convention in the (geo)pandas community.
# Create GeoDataFrame
gdf = gpd.GeoDataFrame.from_features(data, crs=data['crs'])
# Convert GeoDataFrame column data types
gdf['Length'] = pd.to_numeric(gdf['Length'])
gdf['ElevationGainInDir'] = pd.to_numeric(gdf['ElevationGainInDir'])
gdf['ElevationGainInOppDir'] = pd.to_numeric(gdf['ElevationGainInOppDir'])
Let's look at the first few rows we've loaded into the dataframe.
Notice how each row represents a feature, the linestring geometry is represented in the geometry column, and each attribute is represented in its own column - ID, RoadClassification, StartNode etc - tidy data!
gdf.describe()
# Display head of GeoDataFrame
gdf.head(5)
We can also visualise a rudimentary plot of the dataframe. We'll build a more sophisticated map later on - this just shows the shape of the road network in this area.
ax = gdf.plot(color='#ff1f5b', figsize=(15, 15))
# Turn plot axis off
ax.axis('off')
We will create a simple weighting algorithm, which we will assign as the path "weight" of a link in our graph. This weighting algorithm takes advantage of the fact that the OS Features API includes rich attribution with each geographic feature.
For our use case - finding direct and low-traffic cycling routes - we will emphasize certain factors. Specifically, our weighting algorithm will integrate dimensions including:
| Dimension | Affect on weight |
|---|---|
RouteHierarchy |
Smaller roads have lower weight |
Length |
Longer roads have greater weight |
ElevationGainInDir |
Greater values have greater weight |
ElevationGainInOppDir |
Greater values have greater weight |
roadWeight = {
'Restricted Local Access Road': 0,
'Minor Road': 10,
'Local Road': 20,
'A Road': 100,
'A Road Primary': 150,
'Restricted Secondary Access Road': 0,
'Local Access Road': 0,
'Secondary Access Road': 0
}
def cyclingWeight(row):
weight = 0
weight += roadWeight[row['RouteHierarchy']]
weight += row['Length'] / 1000
weight += row['ElevationGainInDir'] / 10
weight += row['ElevationGainInOppDir'] / 10
return weight
The pandas apply method applies a function to every row or column in a dataframe - we'll specify we want it applied to rows by passing in axis=1 and assign the result of the weighting function to a cell in a new column, weight.
gdf['weight'] = gdf.apply(cyclingWeight, axis=1)
# passing in the cyclingWeight function defined above
# Display subset of GeoDataFrame containing modelled link weights
gdf[['ID', 'StartNode', 'EndNode', 'Length', 'ElevationGainInDir', 'ElevationGainInOppDir', 'weight']].head()
Now we're ready to create a graph that will enable us to analyse the network. A lot of the complexity of this process is abstracted away by the NetworkX library. We'll make sure we have the actual Length, and the adjusted weight, available to the graph so we can compare routes made to optimise for each metric.
# Create NetworkX Graph
G = nx.from_pandas_edgelist(gdf, 'StartNode', 'EndNode', ['weight', 'Length'])
# Display Graph info
print(nx.info(G))
Now we need to analyse the graph we've built to find the shortest path between to nodes on the network. We're going to use Dijkstra's algorithm. The output of the pathfinding algorithm will be a list of the nodes you would pass through to get from start to end taking the shortest route through the network.
Here it becomes clear why we used weight and Length - the relative "distances" between nodes will affect the resulting path. Since we're weighting the larger roads more heavily, in our graph they'll seem "longer" than their physical distance - this means that going slightly out of the way on smaller roads will appear to the algorithm as being a shorter route. Put another way, a 1km length or dual carriageway might have a length of 1000 and a weight of 2000, while a 1km length of single track road might have a length of 1000 and a weight of 1100.
We'll create two routes through the network: one that is the shortest length, and one that is the lowest weight. With this we'll be able to see if and how our weighting function affects the route.
# Work out shortest path from nodes chosen at random
startNode = list(G.nodes())[0] # the start node's TOID
endNode = list(G.nodes())[201] # the end node's TOID
# First, the shortest path based on the length of each connecting link
dijkstraLengthNodes = nx.dijkstra_path(G, startNode, endNode, 'Length')
# Then, the shortest path based on the weight dimension
dijkstraWeightNodes = nx.dijkstra_path(G, startNode, endNode, 'weight')
These two variables - dijkstraLengthNodes and dijkstraWeightNodes - are lists containing the TOIDs of the nodes the shortest paths pass through to get from the start to the end of our desired route.
We'll use these lists of TOIDs to create a mask, which is a series of boolean values - True if the row is included and False if it isn't. We're adding columns to gdf, one for the dijkstraWeightMask and one for the dijktraLengthMask.
# Create a mask to easily select shortest path route segments
gdf['dijkstraLengthMask'] = gdf['StartNode'].isin(dijkstraLengthNodes) & gdf['EndNode'].isin(dijkstraLengthNodes)
gdf['dijkstraWeightMask'] = gdf['StartNode'].isin(dijkstraWeightNodes) & gdf['EndNode'].isin(dijkstraWeightNodes)
Now we've worked out the weighted shortest path through the road network fetched from the OS Features API using Geopandas and NetworkX! We'll plot results on a map, so we can see the shortest weighted path, and compare it with the path with the shortest length.
First we'll set up the OS Maps API using Folium:
# Set options and ZXY endpoints for OS Maps API:
layer = 'Light_3857'
zxy_path = 'https://api.os.uk/maps/raster/v1/zxy/{}/{{z}}/{{x}}/{{y}}.png?key={}'.format(layer, key)
# Create a new Folium map
# Ordnance Survey basemap using the OS Data Hub OS Maps API centred on the boundary centroid location
# Zoom levels 7 - 16 correspond to the open data zoom scales only
m = folium.Map(location=[50.916438, -1.397284],
min_zoom=7,
max_zoom=16,
tiles=zxy_path,
attr='Contains OS data © Crown copyright and database right {}'.format(datetime.year))
Then we'll define four styles for the linestrings we'll plot on our map: the road links, the Dijkstra shortest length path, the Dijkstra lowest weight path, and a highlighted line segment.
# Define feature style function
def roadlink(feature):
"""
Defines how GeoJSON features in a Leaflet overlay will be styled
"""
return {'fillColor': '#009ADE',
'color': '#009ADE',
'weight': 2,
'fillOpacity':.3}
# Define feature highlight function
def highlight(feature):
"""
Defines how GeoJSON features in a Leaflet overlay will be highlighted on-hover
"""
return {'weight':9,
'color':'#AF58BA'}
def dijkstraLowestWeight(feature):
"""
Defines how GeoJSON features in a Leaflet overlay will be styled
"""
return {'fillColor': '#00CD6C',
'color': '#00CD6C',
'weight': 5,
'fillOpacity':.3}
def dijkstraLowestLength(feature):
"""
Defines how GeoJSON features in a Leaflet overlay will be styled
"""
return {'fillColor': '#FF1F5B',
'color': '#FF1F5B',
'weight': 9,
'fillOpacity':.3}
Next we create each linestring by using the folium.GeoJson method, passing in the GeoDataFrame with the linestring segments, the style function and the highlight function.
We'll add each layer to the map, m.
# Define feature layer using boundary GeoJSON returned by WFS
dijkstraWeightOverlay = folium.GeoJson(gdf[gdf['dijkstraWeightMask']==True],
name='dijkstraWeight',
style_function=dijkstraLowestWeight,
highlight_function=highlight)
# Define feature layer using boundary GeoJSON returned by WFS
dijkstraLengthOverlay = folium.GeoJson(gdf[gdf['dijkstraLengthMask']==True],
name='dijkstraLength',
style_function=dijkstraLowestLength,
highlight_function=highlight)
# Define feature layer using boundary GeoJSON returned by WFS
roadlinkOverlay = folium.GeoJson(gdf,
popup=folium.GeoJsonPopup(fields=['ID', 'StartNode', 'EndNode', 'FormOfWay', 'RouteHierarchy']),
name='roadlinks',
style_function=roadlink,
highlight_function=highlight)
# Add feature layer to map
roadlinkOverlay.add_to(m)
dijkstraLengthOverlay.add_to(m)
dijkstraWeightOverlay.add_to(m)
And finally we add some attribution details to the map like the OS logo, set a bounding box so the map fits to the network we've fetched, and call m so the map gets visualised in the notebook:
# Obtain current date-time
date = datetime.now()
# OS logo image
logo_url = 'https://labs.os.uk/public/os-api-branding/v0.1.0/img/os-logo-maps.svg'
# Folium FloatImage plugin for displaying an image on the map
float_image = FloatImage(logo_url, bottom=1, left=1)
# Add OS logo image to map
float_image.add_to(m)
total_bbox = [[gdf.total_bounds[1], gdf.total_bounds[0]], [gdf.total_bounds[3], gdf.total_bounds[2]]]
# Return map object
m.fit_bounds(total_bbox)
m
There we go! We can see the shortest path in yellow and the weighted shortest path in green.
How could this be extended? What could we do to improve?